This PR implements cross-file tracking of filter selectivity in ParquetSource to adaptively reorder and demote low-selectivity filters. Filters that don't filter enough rows (configurable, default 80% threshold) are demoted from row-level pushdown to post-scan inline application, reducing I/O overhead.

Key changes:
- Add SelectivityTracker to track filter effectiveness across files
- ExprKey wrapper enables HashMap keying by PhysicalExpr structural equality
- Each ParquetOpener queries shared stats to partition filters into:
  - Row filters (push down): filters with effectiveness >= threshold or unknown
  - Post-scan filters: filters with effectiveness < threshold
- Post-scan filters are added to projection, applied inline in stream, then filter columns are removed from output
- SelectivityUpdatingStream updates tracker when stream completes
- build_row_filter_with_metrics() returns per-filter metrics for tracking
- Filters are reordered by observed effectiveness (most selective first)

Configuration:
- `parquet_options.filter_effectiveness_threshold` (default: 0.8)
- Effectiveness = 1 - (rows_matched / rows_total) = fraction filtered out

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
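
For readers following the description, here is a rough, standalone sketch of the partitioning rule (effectiveness = 1 - rows_matched / rows_total, compared against the threshold). It is not the code from this PR: `FilterStats`, `Tracker`, and `example` are hypothetical names, plain strings stand in for the `ExprKey` wrapper around `PhysicalExpr`, and placing filters with unknown effectiveness last among the row filters is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical per-filter counters, accumulated across files.
#[derive(Default, Clone, Copy)]
struct FilterStats {
    rows_total: u64,
    rows_matched: u64,
}

impl FilterStats {
    /// Fraction of rows filtered out; `None` until any rows were observed.
    fn effectiveness(&self) -> Option<f64> {
        if self.rows_total == 0 {
            None
        } else {
            Some(1.0 - self.rows_matched as f64 / self.rows_total as f64)
        }
    }
}

/// Hypothetical stand-in for the shared tracker; keys are filter strings
/// here, whereas the PR keys on an ExprKey wrapper.
#[derive(Default)]
struct Tracker {
    stats: HashMap<String, FilterStats>,
}

impl Tracker {
    /// Record the outcome of one filter over one file or batch.
    fn update(&mut self, filter: &str, rows_matched: u64, rows_total: u64) {
        let s = self.stats.entry(filter.to_string()).or_default();
        s.rows_matched += rows_matched;
        s.rows_total += rows_total;
    }

    /// Split filters into (row filters, post-scan filters).
    /// Unknown or effective filters are pushed down; the rest are demoted.
    fn partition(&self, filters: &[&str], threshold: f64) -> (Vec<String>, Vec<String>) {
        let mut row: Vec<(String, Option<f64>)> = Vec::new();
        let mut post: Vec<String> = Vec::new();
        for f in filters {
            let eff = self.stats.get(*f).and_then(|s| s.effectiveness());
            match eff {
                Some(e) if e < threshold => post.push(f.to_string()),
                _ => row.push((f.to_string(), eff)),
            }
        }
        // Most selective first; unknown effectiveness sorts last (assumption).
        row.sort_by(|a, b| {
            let ea = a.1.unwrap_or(f64::NEG_INFINITY);
            let eb = b.1.unwrap_or(f64::NEG_INFINITY);
            eb.partial_cmp(&ea).unwrap_or(std::cmp::Ordering::Equal)
        });
        (row.into_iter().map(|(f, _)| f).collect(), post)
    }
}

/// Usage example with made-up row counts.
fn example() -> (Vec<String>, Vec<String>) {
    let mut tracker = Tracker::default();
    tracker.update("a > 1", 50, 1000); // filters out 95% of rows
    tracker.update("b < 9", 900, 1000); // filters out only 10% of rows
    tracker.update("d = 4", 100, 1000); // filters out 90% of rows
    // "c = 3" has no stats yet, so it is treated as unknown and pushed down.
    tracker.partition(&["b < 9", "c = 3", "d = 4", "a > 1"], 0.8)
}
```

With the default threshold of 0.8, `example()` keeps `a > 1`, `d = 4`, and the unseen `c = 3` as row filters and demotes `b < 9` to a post-scan filter.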
- Fix clippy error: Use datafusion_common::instant::Instant instead of std::time::Instant for WASM compatibility (opener.rs:706)
- Fix parquet test failures: Set filter_effectiveness_threshold to 0.0 in test helper when pushdown_predicate is enabled. This ensures filters are pushed down immediately rather than waiting for adaptive selectivity learning.
- Fix filter_pushdown_view test: Disable pushdown_filters in test config so filters stay as FilterExec nodes rather than being pushed into the Parquet reader.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
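
To illustrate why a threshold of 0.0 forces immediate pushdown in the tests, here is a small sketch of the decision rule as the PR description states it. `should_push_down` is a hypothetical helper, not a function from this PR; it assumes effectiveness always lies in [0, 1].

```rust
/// Hypothetical decision rule: push a filter down when its effectiveness
/// is unknown or at least the configured threshold.
fn should_push_down(effectiveness: Option<f64>, threshold: f64) -> bool {
    match effectiveness {
        None => true, // no stats yet: push down
        Some(e) => e >= threshold,
    }
}

/// With threshold 0.0 every effectiveness in [0, 1] passes the check.
fn always_pushed_with_zero_threshold() -> bool {
    [None, Some(0.0), Some(0.5), Some(1.0)]
        .iter()
        .all(|e| should_push_down(*e, 0.0))
}
```

Since effectiveness is never negative, setting the threshold to 0.0 means no filter can be demoted, which keeps the existing pushdown tests deterministic.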
run benchmark tpch
This reverts commit 72b078a.
run benchmark tpch
run benchmark tpch
run benchmarks
🤖: Benchmark completed
This reverts commit 84346da.
run benchmark tpch
run benchmarks
🤖: Benchmark completed
This reverts commit d26ceb0.
run benchmark
🤖 Hi @Dandandan, thanks for the request (#19694 (comment)).
Please choose one or more of these with
run benchmark tpch
run benchmarks
🤖: Benchmark completed
🤖: Benchmark completed
🤖: Benchmark completed
🤖: Benchmark completed
@Dandandan care to shed some light on the current state of this test? I see you've:
What's the combined impact of these changes? I'm having trouble tracking the benchmark results.
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?